FLAW: FaiLure-Aware Workflow Scheduling in High Performance Computing Systems

نویسندگان

  • Zhifeng Yu
  • Chenjia Wang
  • Weisong Shi
چکیده

The goal of workflow application scheduling is to achieve minimal makespan for each workflow. Scheduling workflow applications in high performance computing environments, e.g., clusters, is an NP-Complete problem, and becomes more complicated when potential resource failures are considered. While more research on failure prediction has been witnessed in recent years to improve system availability and reliability, very few of them attack the problem in the context of workflow application scheduling. In this paper, we study how a workflow scheduler benefits from failure prediction and propose FLAW, a failure-aware workflow scheduling algorithm. Furthermore, we propose two important definitions on accuracy, Application Oblivious Accuracy (AOA) and Application Aware Accuracy (AAA), from the perspectives of system and scheduling respectively, as we observe that the prediction accuracy defined conventionally imposes different performance implications on different applications and fails to measure how that improves scheduling effectiveness. The comprehensive evaluation results using real failure traces show that FLAW performs well with practically achievable prediction accuracy by reducing the average makespan, the loss time and the number of job rescheduling.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowdays, energy consumption as a critical issue in distributed computing systems with high performance has become so green computing tries to energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...

متن کامل

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

متن کامل

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

متن کامل

An Efficient Genetic Algorithm for Task Scheduling on Heterogeneous Computing Systems Based on TRIZ

An efficient assignment and scheduling of tasks is one of the key elements in effective utilization of heterogeneous multiprocessor systems. The task scheduling problem has been proven to be NP-hard is the reason why we used meta-heuristic methods for finding a suboptimal schedule. In this paper we proposed a new approach using TRIZ (specially 40 inventive principles). The basic idea of thi...

متن کامل

An agent based network resource planner for workflow applications

Many scientific workflow applications are driven by simulation generated data, or data collected from sensors or instruments, and the processing of the data is commonly done at a different location from where the data is stored. Moving large quantities of data among different locations is thus a frequently invoked process in scientific workflow applications. These data transfers often have high...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007